Prerequisite: Model-Driven Telemetry (MDT) data retrieved from a router, timestamp-aligned and merged into a single file (merged.csv). The data is already filtered and contains only numeric counters.
This notebook performs the following steps:
%load_ext autoreload
%autoreload 2
import modules.dataset as ds
from dotenv import load_dotenv
load_dotenv("env")
ds.extract_dataset('./datasets/mdt-demo.tgz', './output')
import modules.mdt.datasets as mdt_ds
datasets = mdt_ds.Datasets(datasets_dir='./output')
datasets.jupyter_select_dataset_device(select_file=False)
See mdt_data_process notebook for how the merged CSV is curated.
import pandas as pd
import modules.utils as utils
from io import StringIO
merged_data_fn, _ = datasets.get_input_data_file("merged.csv")
df = pd.read_csv(merged_data_fn)
# show number of rows and columns - dimensionality
shape = df.shape
print("dataset dimensions: rows={}, columns={}".format(shape[0], shape[1]))
# display a sample of the dataset: the first 9 rows and first 9 columns (iloc end index is exclusive).
utils.displayDataFrame(df.iloc[0:9,0:9])
dataset dimensions: rows=1079, columns=7334
| ts.V1 | n0:Cisco-IOS-XR-drivers-media-eth-oper:ethernet-interface_statistics_statistic.csv:HundredGigE0/0/0/0:received-good-bytes | n0:Cisco-IOS-XR-drivers-media-eth-oper:ethernet-interface_statistics_statistic.csv:HundredGigE0/0/0/0:received-good-frames | n0:Cisco-IOS-XR-drivers-media-eth-oper:ethernet-interface_statistics_statistic.csv:HundredGigE0/0/0/0:received-multicast-frames | n0:Cisco-IOS-XR-drivers-media-eth-oper:ethernet-interface_statistics_statistic.csv:HundredGigE0/0/0/0:received-total-bytes | n0:Cisco-IOS-XR-drivers-media-eth-oper:ethernet-interface_statistics_statistic.csv:HundredGigE0/0/0/0:received-total-frames | n0:Cisco-IOS-XR-drivers-media-eth-oper:ethernet-interface_statistics_statistic.csv:HundredGigE0/0/0/0:received-total-octet-frames-from1024-to1518 | n0:Cisco-IOS-XR-drivers-media-eth-oper:ethernet-interface_statistics_statistic.csv:HundredGigE0/0/0/0:received-total-octet-frames-from128-to255 | n0:Cisco-IOS-XR-drivers-media-eth-oper:ethernet-interface_statistics_statistic.csv:HundredGigE0/0/0/0:received-total-octet-frames-from1519-to-max |
|---|---|---|---|---|---|---|---|---|
| 1558249381.658611 | 513408648445952 | 121428366854.500000 | 70062.976587 | 513408648445952 | 121428366854.500000 | 16954910447.408417 | 500207085.414062 | 55821925436.250000 |
| 1558249391.658611 | 513493882415104 | 121439268000.000000 | 70063.975488 | 513493882415104 | 121439268000.000000 | 16954912136.550323 | 500306645.896484 | 55831735150.250000 |
| 1558249401.658611 | 513570679283712 | 121449025040.250000 | 70064.000000 | 513570679283712 | 121449025040.250000 | 16954914256.728149 | 500380898.357422 | 55840574311.500000 |
| 1558249411.658611 | 513647466450944 | 121458776629.500000 | 70064.000000 | 513647466450944 | 121458776629.500000 | 16954916430.509275 | 500454012.693359 | 55849411445.250000 |
| 1558249421.658611 | 513724222164992 | 121468531715.750000 | 70064.911685 | 513724222164992 | 121468531715.750000 | 16954918651.233582 | 500528780.119141 | 55858245583.500000 |
| 1558249431.658611 | 513800108363776 | 121478179924.000000 | 70065.000000 | 513800108363776 | 121478179924.000000 | 16954920910.709656 | 500602571.416016 | 55866977679.500000 |
| 1558249441.658611 | 513876775303168 | 121487920800.750000 | 70065.909252 | 513876775303168 | 121487920800.750000 | 16954923308.024719 | 500675612.757812 | 55875800239.250000 |
| 1558249451.658611 | 513952962519040 | 121497604673.250000 | 70066.908253 | 513952962519040 | 121497604673.250000 | 16954925578.743223 | 500749016.750000 | 55884566475.250000 |
| 1558249461.658611 | 514032355700736 | 121507688129.500000 | 70067.000000 | 514032355700736 | 121507688129.500000 | 16954927959.780272 | 500824318.638672 | 55893704392.500000 |
See mdt_data_process notebook for how the processed-offline CSV is curated.
Network data collected on routers is multivariate and highly heterogeneous. Some counters are incremental (e.g., packet counts), others are percentages (e.g., CPU usage), and value ranges vary widely (e.g., byte counts in the trillions versus booleans that can only be zero or one). An example of incremental data that ranges in the trillions can be found here.
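To make the heterogeneity concrete, here is a minimal sketch of one common way to bring a cumulative counter and a percentage onto a comparable scale: take per-interval deltas of the counter, then min-max scale each series to [0, 1]. This is only an illustration and is not claimed to be the notebook's preprocessing pipeline; the helper names `deltas` and `min_max_scale` and the CPU values are hypothetical, while the byte counts are taken from the merged.csv sample above.

```python
def deltas(series):
    """Per-interval increase of a cumulative counter."""
    return [b - a for a, b in zip(series, series[1:])]


def min_max_scale(series):
    """Rescale values linearly to [0, 1]; a constant series maps to all zeros."""
    lo, hi = min(series), max(series)
    if hi == lo:
        return [0.0 for _ in series]
    return [(v - lo) / (hi - lo) for v in series]


# First four received-good-bytes samples from the merged.csv excerpt.
received_good_bytes = [
    513408648445952,
    513493882415104,
    513570679283712,
    513647466450944,
]
# Hypothetical CPU usage percentages, for illustration only.
cpu_percent = [12.0, 15.0, 11.0, 30.0]

byte_rate = deltas(received_good_bytes)   # bytes received per sampling interval
scaled_bytes = min_max_scale(byte_rate)   # now in [0, 1]
scaled_cpu = min_max_scale(cpu_percent)   # now in [0, 1]

print(byte_rate)
print(scaled_bytes)
print(scaled_cpu)
```

After scaling, both series live in the same range, so a change in one can be compared with a change in the other regardless of the original units.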
To compare information from different sources, preprocessing of the selected dataset includes three consecutive steps, each operating over the entire time series:
preprocessed_data_fn, _ = datasets.get_input_data_file("preprocessed_offline.csv")
df = pd.read_csv(preprocessed_data_fn)
# show number of rows and columns - dimensionality
shape = df.shape
print("dataset dimensions: rows={}, columns={}".format(shape[0], shape[1]))
# display a sample of the dataset: the first 9 rows and first 9 columns (iloc end index is exclusive).
utils.displayDataFrame(df.iloc[0:9,0:9])
dataset dimensions: rows=1079, columns=7334
| ts | n0:Cisco-IOS-XR-drivers-media-eth-oper:ethernet-interface_statistics_statistic.csv:HundredGigE0/0/0/0:received-good-bytes | n0:Cisco-IOS-XR-drivers-media-eth-oper:ethernet-interface_statistics_statistic.csv:HundredGigE0/0/0/0:received-good-frames | n0:Cisco-IOS-XR-drivers-media-eth-oper:ethernet-interface_statistics_statistic.csv:HundredGigE0/0/0/0:received-multicast-frames | n0:Cisco-IOS-XR-drivers-media-eth-oper:ethernet-interface_statistics_statistic.csv:HundredGigE0/0/0/0:received-total-bytes | n0:Cisco-IOS-XR-drivers-media-eth-oper:ethernet-interface_statistics_statistic.csv:HundredGigE0/0/0/0:received-total-frames | n0:Cisco-IOS-XR-drivers-media-eth-oper:ethernet-interface_statistics_statistic.csv:HundredGigE0/0/0/0:received-total-octet-frames-from1024-to1518 | n0:Cisco-IOS-XR-drivers-media-eth-oper:ethernet-interface_statistics_statistic.csv:HundredGigE0/0/0/0:received-total-octet-frames-from128-to255 | n0:Cisco-IOS-XR-drivers-media-eth-oper:ethernet-interface_statistics_statistic.csv:HundredGigE0/0/0/0:received-total-octet-frames-from1519-to-max |
|---|---|---|---|---|---|---|---|---|
| 1558249381.658611 | 0.681327 | 0.687531 | 0.504115 | 0.681327 | 0.687531 | 0.451305 | 0.858389 | 0.681585 |
| 1558249391.658611 | 0.681327 | 0.687531 | 0.504115 | 0.681327 | 0.687531 | 0.451305 | 0.858389 | 0.681585 |
| 1558249401.658611 | 0.644663 | 0.648323 | 0.258243 | 0.644663 | 0.648323 | 0.517278 | 0.741312 | 0.644928 |
| 1558249411.658611 | 0.626289 | 0.628532 | 0.129121 | 0.626289 | 0.628532 | 0.558469 | 0.677508 | 0.626523 |
| 1558249421.658611 | 0.616965 | 0.618757 | 0.294610 | 0.616965 | 0.618757 | 0.586249 | 0.653254 | 0.617207 |
| 1558249431.658611 | 0.608525 | 0.610206 | 0.169590 | 0.608525 | 0.610206 | 0.606070 | 0.636611 | 0.608696 |
| 1558249441.658611 | 0.607697 | 0.609107 | 0.314231 | 0.607697 | 0.609107 | 0.637078 | 0.624820 | 0.607856 |
| 1558249451.658611 | 0.605199 | 0.606604 | 0.409198 | 0.605199 | 0.606604 | 0.633205 | 0.620602 | 0.605310 |
| 1558249461.658611 | 0.617882 | 0.619045 | 0.227750 | 0.617882 | 0.619045 | 0.648154 | 0.627273 | 0.618074 |
Detect clusters using DBSCAN and the associated transitions of the system between the clusters.
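As a rough illustration of the idea (not the internals of `ChangepointDetector`, which are not shown here), the sketch below clusters time-ordered samples with a minimal DBSCAN written in plain Python and reports a transition wherever the cluster label changes between consecutive samples. The functions `dbscan` and `transitions`, the parameters `eps` and `min_samples`, and the synthetic data are all assumptions made for this example; in practice a library implementation such as scikit-learn's `DBSCAN` would normally be used.

```python
import math


def dbscan(points, eps, min_samples):
    """Minimal DBSCAN. Returns one cluster label per point; -1 marks noise."""
    n = len(points)
    # Neighbourhood of each point (includes the point itself). O(n^2), fine for a demo.
    neighbors = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels = [-1] * n
    visited = [False] * n
    cluster = 0
    for i in range(n):
        if visited[i]:
            continue
        visited[i] = True
        if len(neighbors[i]) < min_samples:
            continue  # not a core point: stays noise unless a cluster claims it later
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # core or border point joins the cluster
            if visited[j]:
                continue
            visited[j] = True
            if len(neighbors[j]) >= min_samples:
                queue.extend(neighbors[j])  # expand through core points only
        cluster += 1
    return labels


def transitions(labels):
    """Indices where the cluster label differs from the previous sample."""
    return [t for t in range(1, len(labels)) if labels[t] != labels[t - 1]]


# Synthetic time series: regime A, then regime B, then back to A (made-up values).
regime_a = [(0.01 * k, 0.0) for k in range(20)]
regime_b = [(5.0 + 0.01 * k, 5.0) for k in range(20)]
samples = regime_a + regime_b + regime_a

labels = dbscan(samples, eps=1.0, min_samples=5)
change_indices = transitions(labels)
print(change_indices)
```

Because clustering ignores time order, the system returning to an earlier state lands back in the same cluster, and the candidate changepoints are the sample indices where the label switches.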
from modules.mdt.data_utils import load_data, ORIGINAL_DATA
from modules.mdt.changepoint_detector import ChangepointDetector
tstp, dataframe = load_data(preprocessed_data_fn, scale=False, data_selection=ORIGINAL_DATA, ft_regex="^(?!.*(time|second)).*")
detector = ChangepointDetector(dataframe, datasets.get_device())
detector.detect()
detector.plot(withEvents=False)
detector.plot(withEvents=True)
detector.select_changepoints()
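The `ft_regex` argument passed to `load_data` uses a negative lookahead, which can be hard to read at a glance. The sketch below shows which names such a pattern accepts, assuming (this is not confirmed by the notebook) that `load_data` applies it to column names with `re.match`. The last two column names are hypothetical.

```python
import re

# Same pattern as in the load_data call: accept a name only if neither
# "time" nor "second" appears anywhere in it.
ft_regex = re.compile(r"^(?!.*(time|second)).*")

column_names = [
    "HundredGigE0/0/0/0:received-good-bytes",
    "bfd_summary.csv:::session-state__down-count",
    "system-time:uptime",                  # hypothetical, contains "time"
    "interface:seconds-since-last-clear",  # hypothetical, contains "second"
]

kept = [name for name in column_names if ft_regex.match(name)]
dropped = [name for name in column_names if not ft_regex.match(name)]

print(kept)
print(dropped)
```

The lookahead `(?!.*(time|second))` is evaluated at the start of the string and fails if either word occurs later, so clock-like fields that change at every sample are excluded from the analysis.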
The selection problem, i.e., "which of the many features that change are the most descriptive for the change", is approached by optimizing an information-theoretic metric, i.e., cross-entropy. The goal here is to find the subset of features that describes best what is changing at the given timestamp. The intuition is that cross-entropy gives both the amount of additional information in the subset, and the divergence of the subset distribution from the original one. The added regularization term also allows for the tuning of the verbosity of the output.
More details can be found in T. Feltin, J. A. C. Fuertes, F. Brockners and T. H. Clausen, "Understanding Semantics in Feature Selection for Fault Diagnosis in Network Telemetry Data", NOMS 2023 - 2023 IEEE/IFIP Network Operations and Management Symposium.
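The intuition about cross-entropy rests on a standard identity: for distributions p and q, H(p, q) = H(p) + D_KL(p ‖ q), i.e., the information content of p plus the divergence of q from p. The sketch below checks this numerically on two toy distributions. It does not reproduce the objective or the regularization term used by `Retriever`; the distributions and function names are made up for illustration.

```python
import math


def entropy(p):
    """Shannon entropy H(p) in bits."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)


def cross_entropy(p, q):
    """Cross-entropy H(p, q) in bits; q must be positive wherever p is."""
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)


def kl_divergence(p, q):
    """Kullback-Leibler divergence D_KL(p || q) in bits."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Toy distributions over three outcomes (illustrative values only).
p = [0.5, 0.25, 0.25]
q = [0.25, 0.25, 0.5]

h_p = entropy(p)
h_pq = cross_entropy(p, q)
d_kl = kl_divergence(p, q)

# The identity H(p, q) = H(p) + D_KL(p || q) holds up to rounding.
print(h_pq, h_p + d_kl)
```

A single cross-entropy score therefore grows both when the distribution carries more information and when it departs further from the reference distribution.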
from modules.mdt.retriever import Retriever
import modules.utils as utils
from IPython.display import clear_output
tstp, dataframe = load_data(merged_data_fn, scale=False, data_selection=ORIGINAL_DATA, ft_regex="^(?!.*(time|second|minute|hour|pid|port)).*",
remove_nan=True, remove_inf=True)
selected_changepoints = detector.get_changepoints()
retriever = Retriever(dataframe)
features = retriever.retrieve(selected_changepoints)
Module modules.mdt.explain_lib, development version
Module modules.mdt.selection_lib, development version
Running optimisation...
--------------------------------------------------
Total features considered: 104
Alpha: 2
--------------------------------------------------
Epoch 1 : Score = 17.16
Epoch 2 : Score = 17.16
Epoch 3 : Score = 17.16
mdt_changepoints = []
for feature, data in features.items():
    mdt_changepoints.append({
        "Event": f"{feature - tstp[0]}",
        "Features": '\n'.join(data),
        "Source": "MDT",
        "Type": "NETWORK_DEVICE"
    })
clear_output()
utils.displayDictionary(mdt_changepoints)
| Event | Features | Source | Type |
|---|---|---|---|
| 4820.0 | Cisco-IOS-XR-ip-bfd-oper:bfd_counters_packet-counters_packet-counter.csv:bfd-mgmt-pkt-display-type-none:HundredGigE0/0/0/16:0/0/CPU0:hello-receive-count CHANGE: 1.0 Cisco-IOS-XR-ip-bfd-oper:bfd_session-briefs_session-brief.csv:172.31.14.48:HundredGigE0/0/0/16:0/0/CPU0:0/0/CPU0:ip-single-hop:status-brief-information__async-interval-multiplier__negotiated-local-transmit-interval CHANGE: 1667000.0 Cisco-IOS-XR-ip-bfd-oper:bfd_session-briefs_session-brief.csv:172.31.14.48:HundredGigE0/0/0/16:0/0/CPU0:0/0/CPU0:ip-single-hop:status-brief-information__async-interval-multiplier__negotiated-remote-transmit-interval CHANGE: 1667000.0 Cisco-IOS-XR-ip-bfd-oper:bfd_summary.csv:::session-state__down-count CHANGE: 1.0 Cisco-IOS-XR-ip-bfd-oper:bfd_summary.csv:::session-state__up-count CHANGE: -1.0 | MDT | NETWORK_DEVICE |
Leverage an LLM to turn the selected set of features along with the amplitude of change into a diagnosis and resolution in natural language.
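The internals of `Diagnose` are not shown in this notebook. As a hedged sketch of the general idea, the hypothetical helper below turns one changepoint record (in the same dictionary shape as `mdt_changepoints`) into a plain-text prompt. The function name `build_diagnosis_prompt` and the prompt wording are assumptions and may differ from what the module actually sends.

```python
def build_diagnosis_prompt(changepoint):
    """Format one changepoint record as a plain-text prompt (hypothetical layout)."""
    lines = [
        "You are assisting with network fault diagnosis.",
        f"Source: {changepoint['Source']} ({changepoint['Type']})",
        f"Event offset from first sample: {changepoint['Event']}",
        "Features that changed, with the amplitude of change:",
    ]
    # "Features" holds one feature per line, as produced by '\n'.join(data).
    lines += [f"- {feature}" for feature in changepoint["Features"].split("\n") if feature]
    lines.append("Give a likely diagnosis and a suggested resolution.")
    return "\n".join(lines)


# Record shaped like the entries of mdt_changepoints (two of the features shown above).
example = {
    "Event": "4820.0",
    "Features": (
        "Cisco-IOS-XR-ip-bfd-oper:bfd_summary.csv:::session-state__down-count CHANGE: 1.0\n"
        "Cisco-IOS-XR-ip-bfd-oper:bfd_summary.csv:::session-state__up-count CHANGE: -1.0"
    ),
    "Source": "MDT",
    "Type": "NETWORK_DEVICE",
}

prompt = build_diagnosis_prompt(example)
print(prompt)
```

Keeping the feature names and change amplitudes verbatim gives the model the same evidence an operator would see in the table.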
from modules.diagnose import *
from modules.logger import Logger
from modules.llm.azure_ai import AzureLlm
import logging
import os
logger = Logger(logging.INFO)
llm = AzureLlm(logger,os.getenv('AZURE_OPENAI_API_KEY'))
diagnoser = Diagnose(logger, llm)
diagnoser.setOutputInitialDiagnosis("Diagnosis")
diagnoser.run(mdt_changepoints, inject=True)
utils.displayDictionary(mdt_changepoints)
LLM Endpoint: https://traiage-dev-openai-gpt-35.openai.azure.com/
LLM Prompt: MDT Sensor Path Diagnosis